-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
CLN: make CategoricalIndex._create_categorical a classmethod #21618
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CLN: make CategoricalIndex._create_categorical a classmethod #21618
Conversation
@@ -1130,7 +1130,8 @@ def to_frame(self, index=True): | |||
""" | |||
|
|||
from pandas import DataFrame | |||
result = DataFrame(self._shallow_copy(), columns=[self.name or 0]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_shallow_copy
does not guarantee that _data is copied, so a test fails. to_frame
should get a copy, else smething like this could happen:
>>> ci = pd.CategoricalIndex(['a', 'b', 'c'])
>>> df = ci.to_frame()
>>> df.iloc[0] = 'c'
>>> df
0
c c # index label changed!
b b
c c
By using self.values.copy()
rather than _shallow_copy
, this issue is avoided.
doc/source/whatsnew/v0.23.2.txt
Outdated
- Improved performance of membership checks in :class:`CategoricalIndex` | ||
(i.e. ``x in ci``-style checks are much faster). :meth:`CategoricalIndex.contains` | ||
- Improved performance of membership checks in :class:`CategoricalIndex` and :class:`Categorical` | ||
(i.e. ``x in cat``-style checks are much faster). :meth:`CategoricalIndex.contains` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just a cleanup of the whatsnew for #21508, not related to this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
put this in a separate PR then.
df8a333
to
c32844f
Compare
Codecov Report
@@ Coverage Diff @@
## master #21618 +/- ##
==========================================
+ Coverage 91.9% 91.9% +<.01%
==========================================
Files 153 153
Lines 49547 49548 +1
==========================================
+ Hits 45537 45538 +1
Misses 4010 4010
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is fine, pls do the 0.23.2 change separately though.
doc/source/whatsnew/v0.23.2.txt
Outdated
- Improved performance of membership checks in :class:`CategoricalIndex` | ||
(i.e. ``x in ci``-style checks are much faster). :meth:`CategoricalIndex.contains` | ||
- Improved performance of membership checks in :class:`CategoricalIndex` and :class:`Categorical` | ||
(i.e. ``x in cat``-style checks are much faster). :meth:`CategoricalIndex.contains` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
put this in a separate PR then.
c32844f
to
fa378e2
Compare
Comments addressed & green |
thanks @topper-123 |
Currently,
CategoricalIndex._create_categorical
is a staticmethod, and is being called internally using either instances or classes as its first argument, e.g.:_is_dtype_compat
an instance is supplied as the first argument,__new__
a class is supplied as the first argument.This is confusing and makes the code paths different depending on how the method is called. It makes it difficult to reason about the precise output of the method.
This PR cleans this up by making
_create_categorical
a classmethod. This simplifies stuff, and we can also remove the method's first parameter when calling it.Calling
_create_categorical
unneccesarily is one reason for the slowness of #20395. After this cleanup PR I will do another that should get #20395 come down to 1.6 ms as well as give some other related performance improvements.